Retrieving Arabic Printed Document: a Survey
Abstract
This paper surveys some of the literature on searching and retrieving OCR’ed printed documents, with emphasis on Arabic documents. It examines peculiarities of Arabic morphology, orthography, retrieval, word clustering, display, OCR, and error correction. The paper also surveys existing evaluation test-beds for retrieval of Arabic OCR texts, and concludes with possible directions for future research.
Index Terms: Arabic, Information Retrieval, OCR, Morphology, Orthography, Error Correction.
Similar resources
Document Analysis And Classification Based On Passing Window
In this paper we present a document analysis and classification system that segments and classifies the contents of Arabic document images. The system includes preprocessing, document segmentation, feature extraction, and document classification. In preprocessing, the document image is enhanced by noise removal, binarization, and skew detection and correction. In document segmentation, an algorith...
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate the entry of data into the computer and its digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Persian Printed Document Analysis and Page Segmentation
This paper presents a hybrid method, combining low-resolution and high-resolution stages, for Persian page segmentation. In the low-resolution page segmentation, a pyramidal image structure is constructed for multiscale analysis, and the document image is segmented into a set of regions. In the high-resolution page segmentation, connected components analysis segments each region into homogeneous regions and identifyi...
Arabic-document compression: A close look at group 3 international digital facsimile coding standards
Efficient bit-representation or compression of documents is an important issue in many applications. The amount of compression depends on the document contents such as written scripts, diagrams, tables, etc. The contents of the document determine the limit of this compression. In the CCITT Recommendation T.4, 'Standardization of group 3 apparatus for document transmission', a modified Huffman ...
An Empirical Evaluation of Off-line Arabic Handwriting And Printed Characters Recognition System
Handwriting recognition is a challenging task for many real-world applications such as document authentication, form processing, and historical documents. This paper focuses on a comparative study of off-line handwriting and printed character recognition systems, using Arabic handwriting. The off-line Handwriting Recognition methods for Arabic words which being often used among then across the...